Towards Morphologically Annotated Corpus of Hospital Discharge Reports in Polish
Abstract
The paper discusses problems in annotating a corpus of Polish clinical data with low-level linguistic information. We propose an approach to tokenization and automatic morphological annotation of the data that combines existing programs with a set of domain-specific rules and vocabulary. Finally, we present the results of manual verification of the annotation for a subset of the data.
Similar resources
Some Remarks on Automatic Semantic Annotation of a Medical Corpus
In this paper we present arguments that elaborating a rule-based information extraction system is a good starting point for obtaining a semantically annotated corpus of medical data. Our claim is supported by evaluation results of the automatic annotation of a corpus containing hospital discharge reports of diabetic patients.
Corpus-based Semantic Relatedness for the Construction of Polish WordNet
The construction of a wordnet, a labour-intensive enterprise, can be significantly assisted by automatic grouping of lexical material and discovery of lexical semantic relations. The objective is to ensure high quality of automatically acquired results before they are presented for lexicographers’ approval. We discuss a software tool that suggests synset members using a measure of semantic rela...
Towards 100M Morphologically Annotated Corpus of Tajik
The paper presents a work in progress: building morphologically annotated corpus of Tajik language of the size more than 100 million tokens. The corpus is and will be by far the largest available computer corpus of Tajik: even its current size is almost 85 million tokens. Because the available text sources are rather scarce, to achieve the goal also the texts of a lower quality have to be inclu...
Experiments in Cross-Language Morphological Annotation Transfer
Annotated corpora are valuable resources for NLP which are often costly to create. We introduce a method for transferring annotation from a morphologically annotated corpus of a source language to a target language. Our approach assumes only that an unannotated text corpus exists for the target language and a simple textbook which describes the basic morphological properties of that language is...
Portable Language Technology: a Resource-light Approach to Morpho-syntactic Tagging
Morpho-syntactic tagging is the process of assigning part of speech (POS), case, number, gender, and other morphological information to each word in a corpus. Morpho-syntactic tagging is an important step in natural language processing. Corpora that have been morphologically tagged are very useful both for linguistic research, e.g. finding instances or frequencies of particular constructions in...